We proposed a probabilistic algorithm to solve the Multiple SequenceAlignment problem. The algorithm is a Simulated Annealing (SA) that exploitsthe representation of the Multiple Alignment between $D$ sequences as adirected polymer in $D$ dimensions. Within this representation we can easilytrack the evolution in the configuration space of the alignment through localmoves of low computational cost. At variance with other probabilisticalgorithms proposed to solve this problem, our approach allows for the creationand deletion of gaps without extra computational cost. The algorithm was testedaligning proteins from the kinases family. When D=3 the results are consistentwith those obtained using a complete algorithm. For $D>3$ where the completealgorithm fails, we show that our algorithm still converges to reasonablealignments. Moreover, we study the space of solutions obtained and show thatdepending on the number of sequences aligned the solutions are organized indifferent ways, suggesting a possible source of errors for progressivealgorithms.
展开▼
机译:我们提出了一种概率算法来解决多重序列比对问题。该算法是模拟退火(SA),它利用$ D $序列之间的多重比对表示为$ D $维中的定向聚合物。在此表示中,我们可以通过低计算成本的局部移动轻松跟踪路线的配置空间中的演变。与提出解决该问题的其他概率算法不同,我们的方法允许创建和删除缺口而无需额外的计算成本。测试了该算法以比对激酶家族的蛋白质。当D = 3时,结果与使用完整算法获得的结果一致。对于完全算法失败的$ D> 3 $,我们证明了我们的算法仍然收敛于合理的对齐方式。此外,我们研究了获得的解的空间,并表明,根据排列的序列数,解的组织方式不同,这提示了渐进算法的可能错误来源。
展开▼